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As both a central task in Remote Sensing and a common prob¬ 
lem in many other situations involving time series data, change point 
detection boasts a thorough and well-documented history of study. 

However, the treatment of missing data and proper exploitation of 
the structure in multivariate time series during change point detection 
remains lacking. Multispectral, high temporal resolution time series 
data from NASA’s Moderate Resolution Imaging Spectroradiometer 
(MODIS) instruments provide an attractive and challenging context 
to contribute to the change point detection literature. In an effort to 
better monitor change in land cover using MODIS data, we present a 
novel approach to identifying periods of time in which regions experi¬ 
ence some conversion-type of land cover change. That is, we propose 
a method for parameter estimation and change point detection in the 
presence of missing data which capitalizes on the high dimensionality 
of MODIS data. We test the quality of our method in a simulation 
study alongside a contemporary change point method and apply it 
in a case study at the Xingu River Basin in the Amazon. Not only 
does our method maintain a high accuracy, but can provide insight 
into the types of changes occurring via land cover conversion proba¬ 
bilities. In this way we can better characterize the amount and types 
of forest disturbance in our study area in comparison to traditional 
change point methods. 


1. Introduction. To enhance and inform Earth system models, timely and accurate 
monitoring of land cover must be maintained (Bonan et ah, 2002; Ek et ah, 2003; Running 
and Goughian, 1988; Sterling and Ducharne, 2008). Additionally, because the land area 
affected by humans has expanded rapidly (Ellis and Ramankutty, 2008; Goldewijk, 2001; 
Ramankutty and Foley, 1999; Sanderson et ah, 2002; Vitousek et ah, 1997) and society 
depends to a large extent on terrestrial ecosystems (Foley et ah, 2005), high quality infor¬ 
mation regarding changes in land cover is crucial for modern land-use policy and natural 
resource management. 

Remote sensing instruments onboard various satellite platforms have been providing re¬ 
peated observation of the Earth’s surface, enabling continuous mapping and monitoring of 
land cover change, especially those caused by human activities. With continuous missions, 
some instrument series have observations over the past few decades (e.g., the Landsat series, 
the Advanced Very High Resolution Radiometer (AVHRR) series). A unique sensor named 
the Moderate Resolution Imaging Spectroradiometer (MODIS), has been in orbit onboard 
NASA’s Terra and Aqua satellites since the early 2000s. This instrument strikes a balance 
between moderate spatial resolution (250-500 meters) and high revisit capability, providing 
time series observations for over a decade. However, a host of issues plagues MODIS data 
such as measurement errors, atmospheric contamination, and variable view geometry and 
gridding artifacts (Roy, 2000; Huang et ah, 2002; Tan et ah, 2006), and renders change 
detection a challenging task due to missing and noisy data. 
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Various change detection techniques were developed using bi-temporal or multi-temporal 
imagery for mapping changes including deforestation, forest mortality, and urban expansion 
(see (Singh, 1989; Rogan et ah, 2002; Coppin et ah, 2004; Lu et ah, 2004)). As MODIS time 
series grow, more studies have focused on better exploitation of the temporal information 
in MODIS data for change detection, e.g. (Verbesselt et ah, 2010; Rahman et ah, 2013; 
Huang and Friedl, 2014). However, due to the volume of data and nature of optical remote 
sensing (susceptible to cloud and atmospheric contamination), it remains challenging to pre- 
process and fully utilize the time series data. Thus, there is great need of methods that (i) 
better address missing data; that (ii) explore the rich structure in the data in their spectral, 
temporal, and spatial dimensions; and that (iii) are robust to noise. 

Most existing methods for change detection in the presence of missing data attempt 
to impute or estimate missing data first and then proceed to identify changes (Lunetta 
et ah, 1999, 2006; Boriah, 2010). Estimation can proceed in a number of ways, including, 
for example, nearest neighbor interpolation (Ning and Cheng, 2012; Zhang, 2012; Jerez 
et ah, 2010) or linear, polynomial, or spline interpolation (Junninen et ah, 2004). Missing 
values can be imputed using multiple imputation (Honaker and King, 2010) or expectation- 
maximization (EM) (Dempster et ah, 1977) (for a thorough review of handling missing data 
in statistical analyses, see (Little and Rubin, 2002).) However, since missing data are often 
handled separately from and prior to change point estimation, the imputation does not 
account for possible large changes and so the resulting change detection can lack statistical 
power. 

In this paper we introduce and assess a novel, off-line change point detection model that 
is tailored to the data characteristics of MODIS time series, i.e. large and structured. Our 
key contribution is to characterize change as transitions in land cover: we assume that 
the region of study is reasonably homogeneous, with a predominant “background” land 
cover class, and we evaluate change by implicitly classifying land cover and contrasting 
estimated classes to the background. This way, we can not only detect changes but also 
understand their nature; for instance, we can better assess if native forest was burned, 
logged, or converted to cropland. By exploiting land cover information from training data, 
we specify a Bayesian hierarchical model to detect distributional, conversion-type changes 
in multispectral time series while accounting for missing data. In addition, as opposed to 
at-most-one-change (AMOC) models that aim at detecting single abrupt disruptions, our 
formulation allows for at most two change points and thus also considers possible recovery 
from prior disturbances. We describe the change point detection model in Section 2, and 
we apply and evaluate our model using a simulation study (Section 3.1) and a case study 
(Section 3.2). 

1.1. Data Description. To illustrate the main issues that afflict MODIS data, here we 
describe the dataset that is used in the case study of Section 3.2. We use the MODIS 500 me¬ 
ter Nadir BRDF-adjusted Reflectance (NBAR) product, which is designed to minimize noise 
due to bidirectional reflectance effects arising from varying solar and view geometry (Schaaf 
et ah, 2002). This product features seven spectral bands designed for land observation, 
covering visible to shortwave infrared wavelengths (Survey, 2013). 

For each pixel in the region of interest and for each year in the dataset—from 2001 to 
2010—we originally obtained time series of 46 NBAR composite values for seven spectral 
bands. However, for our analysis we select a temporal subset of 19 observations per year 
(May to September) in order to exclude the wet season and reduce the proportion of missing 
data. Here it is essential to treat years as the main temporal unit to keep seasonality effects, 
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Fig 1. Spectral-temporal profiles for two representative pixels in a study area, bands 1 (black), 5 (red), and 7 
(green.) Gray horizontal lines mark missing data in at least one spectral band. Reflectance values have been 
multiplied by 10000. 


including phenology, that characterize land cover classes. We have verified that this subset 
still keeps enough seasonality within the year to distinguish well between classes. 

As an example, consider the spectro-temporal profiles for two representative pixels in 
Figure 1. To avoid overcrowding the plot, we only show three spectral bands (1, 5, and 7.) 
Gray bands mark missing data locations in at least one band. As we can see, most years 
have at least one time with missing values, and so discarding whole years is unfeasible. 
Moreover, since missing data happens more frequently at the end of our annual time series 
(i.e. start of wet season), it makes it harder to spot change between years. Some changes 
are more evident, as shown in the left plot at year 6, but some are harder to flag and can be 
attributed to minor disturbances, as in the right plot, at year 5. The right plot also highlights 
the possibility of recovery: the data profile seems to have returned to its background land 
cover state after year 9. 

Land cover change detection requires a scheme of land cover classes which encompasses 
all major land cover types. We employ a carefully established set of land cover classes 
constructed under the International Geosphere-Biosphere Programme (IGBP) (Davis and 
Holmgren, 2000), as defined in Table 1. 

1.2. Prior and Related Work. Ghange point detection methods have been applied ex¬ 
tensively in various fields of environmental and climate monitoring, to problems such as 
rates of Tropical cyclone activity, precipitation and temperature trends, and fishery popu¬ 
lation regime change (Eisner et ah, 2000; Ghu and Zhao, 2004; Rodionov, 2005; Solow and 
Beet, 2005). Statistically, the general change point problem can be categorized into on-line 
(real time) (Fearnhead and Liu, 2007) and off-line (retrospective) frameworks. Additionally, 
approaches to change point detection typically involve specifying which types of change to 
look for. Previous methods for detecting change vary by the following change types: mean- 
type shifts (Shao and Zhang, 2010; Lund and Reeves, 2002), variance change (Galeano and 
Pena, 2007), or change in distribution (Basseville and Nikiforov, 1993; Lee, 2010; Tsay, 
1988; Song et ah, 2007; Gombay, 2008). Popular approaches include time series models, 
sequential testing, special forms of regression, and Bayesian techniques (Menzefricke, 1981; 
Booth and Smith, 1982; Stephens, 1994; Perreault et ah, 2000; Fearnhead, 2006). 

With continuous data collection and growing time series from the MODIS instruments, 
many studies in the remote sensing literature have put more emphasis on exploring temporal 
information for land cover change detection. Some of these methods detect change at the 
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Table 1 

Land cover class definitions within the International Geosphere-Biosphere Programme (IGBP.) 


Class Class name Description 

1 Evergreen Lands dominated by trees with a percent canopy cover >60% and height 

Needleleaf exceeding 2 meters. Almost all tree remain green all year. Canopy is never 

Forests without green foliage. 

2 Evergreen Lands dominated by trees with a percent canopy cover > 60% and height 

Broadleaf exceeding 2 meters. Almost all tree remain green all year. Canopy is never 

Forests without green foliage. 

3 Deciduous Lands dominated by trees with a percent canopy cover >60% and height 

Needleleaf exceeding 2 meters. Consists of seasonal needleleaf tree communities with 

Forests an annual cycle of leaf-on and leaf-off periods. 

4 Deciduous Lands dominated by trees with a percent canopy cover >60% and height 

Broadleaf exceeding 2 meters. Consists of seasonal broadleaf tree communities with an 

Forests annual cycle of leaf-on and leaf-off periods. 

5 Mixed Forests Lands dominated by trees with a percent canopy cover >60% and height 

exceeding 2 meters. Consists of tree communities with interspersed mixtures 
or mosaics of the other four forest cover types. None of the forest types 
exceeds 60% of landscape. 

6 Closed Shrub- Lands with woody vegetation less than 2 meters tall and with shrub canopy 

lands cover is >60%. The shrub foliage can be either evergreen or deciduous. 

7 Open Shrub- Lands with woody vegetation less than 2 meters tall and with shrub canopy 

lands cover is 10-60%. The shrub foliage can be either evergreen or deciduous. 

8 Woody Savan- Lands with herbaceous and other understorey systems, and with forest 

nas canopy cover between 30-60%. The forest cover height exceeds 2 meters. 

9 Savannas Lands with herbaceous and other understorey systems, and with forest 

canopy cover between 10-30%. The forest cover height exceeds 2 meters. 

10 Grasslands Lands with herbaceous types of cover. Tree and shrub cover is less than 10%. 

11 Permanent Lands with a permanent mixture of water and herbaceous or woody vege- 

Wetlands tation that cover extensive areas. The vegetation can be present in either 

salt, brackish, or fresh water. 

12 Cropland Lands covered with temporary crops followed by harvest and a bare soil pe¬ 

riod (e.g. single and multiple cropping systems). Note that perennial woody 
crops will be classified as the appropriate forest or shrub land cover type. 

13 Urban and Lands covered by building and other man-made structures. 

Built-Up 

14 Cropland/Nat. Lands with a mosaic of croplands, forest, shrublands, and grasslands in 

Veg. Mosaics which no one component comprises more than 60% of the landscape. 

15 Snow and Ice Lands under snow and/or ice cover throughout the year. 

16 Barren Lands exposed soil, sand, rocks, or snow and never has more than 10% 

vegetated cover during any time of the year. 

17 Water Bodies Oceans, seas, lakes, reservoirs, and rivers. Can be either fresh or salt water 


bodies. 
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pixel level using change indices derived from annual time series (e.g., Linderman et ah, 2005; 
Mildrexler et ah, 2009; Coops et ah, 2009). Other studies developed temporal trajectory- 
based change detection algorithms such as temporal segmentation, structural break test, 
and distance-metric based methods (e.g., Verbesselt et ah, 2010; Sulla-Menashe et ah, 2013; 
Huang and Friedl, 2014). While some of these methods have demonstrated feasibility for 
large area application, it remains challenging to pre-process the data for gap-free input and 
reduce spurious detection of change due to noise. 

In this paper, we use the change detection method described in (Huang and Friedl, 2014) 
for comparison with our method. It is a distance metric-based change detection method 
for identifying changed pixels at annual time steps using 500 m MODIS NEAR time series 
data. The approach we describe uses distance metrics to measure (i) the similarity between 
a pixel’s annual time series to annual time series for pixels of the same land cover class, and 
(ii) the similarity between annual time series from different years at the same pixel. The 
combination of two distance metrics used both spatial (regional land cover related knowl¬ 
edge) and temporal information, and was shown to compare well with reference information 
derived from higher spatial resolution data. A set of essential pre-processing steps, including 
gap-filling, smoothing and temporal subsetting of MODIS 500 m NEAR time series, were 
also described as part of the approach. 

2. Model and Methods. Consider, for each year i = 1,..., J, and each pixel v in 
the region of interest the vector observation Xi^ containing data from B spectral bands 
and T within-year time points. For example, in the data described in Section 1.1, B = 7, 
T = 19, and J = 10. Since our data contain physical dimensions we exploit these features 
by partitioning the variation in the data into spectral and temporal components. Moreover, 
we expect land cover classes to have different mean profiles and different variances so we 
are able to distinguish them. Thus, if is the set of land cover classes and Wv G ^ codes 
for the land cover class of pixel v, we start by modeling the data using a matrix normal 
distribution (Dawid, 1981), or, equivalently, 

(1) \W, = g ~ N{pg, S, ® Dig), 

where (8> denotes the Kronecker product. That is, instead of assuming that our multivariate 
normal data have a single BT x BT covariance matrix we employ a Kronecker structured 
covariance matrix which isolates the spectral covariance in a B xB matrix, and the tem¬ 
poral covariance in a T x T matrix, Note that we assume that spectral variation (S^) 
transcends land cover class, and thus only allow the means (/r^) and temporal covariances 
i^tg) to vary with land cover class g. In this way, we reduce the dimensionality of param¬ 
eters to be estimated while keeping a parsimonious model structure (Glanz et ah, 2014). 
In addition, since the temporal profiles gg capture seasonality and temporal variability is 
represented in we do not need to explicitly model auto-correlation. 

The separable nature of the variance also has the advantage of allowing us to reduce the 
dimensionality of the data using a focused PCA compression. If = PDiag(Ai: b)P'^ is 
the eigen-decomposition of we select the K < B largest eigenvalues and, regarding Xi^ 
as a matrix with B rows, we define a compressed version of Xi^ as 

(2) X*, := Diag(Ai;i,)-'AT^Xi,. 

This transformation is equivalent to approximating S* using K eigenvectors, 'Eg ~ D* := 
Pi-KE>iag{Xi-K)Pi K^ decorrelating the columns of Xiy by S*. 
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Given the very large size of the data, we opt to learn land cover parameters fig, and 
Titg in a pre-processing step instead of jointly with change point estimation. To this end, 
we adopt the EM method proposed in (Glanz et ah, 2014) and apply it to an independent 
training dataset. This kind of prior elicitation is similar to empirical Bayes approaches (Gar- 
lin and Louis, 2000) and aims at simplifying the model and alleviating the computational 
burden of inference. To simplify the notation, for the remainder of this article we denote 
Tig = Tg ^ Ttg- 

While (1) gives a parametric model for the annual data at a pixel, we require a way to 
detect changes in land cover when these observations contain missing values. In pursuit of 
a change point year for each pixel, if it exists, we devise an EM algorithm which accounts 
for the missing data present throughout our region of interest. The following section details 
our hierarchical model and estimation procedure for identifying a change in land cover. 

2.1. Change Point Hierarchical Model and Parameter Estimation. In our scenario, the 
annual data for each pixel, Xi^, are assumed to be conditionally independent of both data 
in other years at pixel v as well as data and potential changes in other pixels. To model 
change, we allow the year sequence 1,...,J to be segmented according to p = {pi,p 2 ), 
1 < pi < P 2 < such that the segment pi -|- 1,..., p 2 is in the “change” state, and the pre- 
and post-change segments 1,..., pi and p 2 + , J are in the “background” state. This 

way, if p 2 < we have recovery from change to background. Lack of change is represented 
by Pi = P 2 = J, the only case when pi = p 2 , that is, for any other configuration we have 
Pi < P2- 

For each pixel v, we assume the data in the background segment, i.e. up to the change 
point year piy and after change point year p 2 v, follow a multivariate normal distribution 
with mean fio^, and the data in the change segment, i.e. from years piv -|- 1 to p 2 v, follow 
another multivariate normal distribution with mean pcv la addition, to accommodate more 
flexibility from pixel to pixel, we add a new level to our model and incorporate land cover 
class information via prior distributions for po« and pcv Specifically, we set conjugate priors 
poi; ~ Tp) where pp and Tp denote the mean and covariance of our background class, 

say Evergreen Broadleaf Forest (EBF); and pcv \ kEy = p ~ ^{Pgi Tg), where now Wy G ‘lo 
indicates the land cover class to which pixel v has transitioned in case of a change. The 
actual observations Xjy now spread around poy and pcv according to variance scales kq and 

tvQ. 

(3) Xjy I pov,Pcv,Pv ~ G BG{py))N{poy,KolBT) + lii 0 BG{py))N{p CVf Ibt), 

where /(•) is the indicator function, the background segment of py is BG(py) = {i : i < 
Ply or i > P 2 v}, and thus change positions i 0 BG(py) correspond to piy < i < p 2 y Since 
the change affects the mean yearly temporal profiles pQy and pcv, we can regard them as 
smoothed versions of Xiy and so this hierarchical model is similar in spirit to the smoothing 
approach of Lunetta et al. (2006). However, since our interest does not lie in the mean 
profile parameters poy and pcv, we can further simplify our model by marginalizing them 
out to obtain: 

Xiy I Py, Wy = p ~ I{i G BG(py))A^ ihF, Tp “L kqIbt) P ^ BG(py))A ^{hgi Tg + FcIbt)- 

As an example, Figure 2 depicts X^y for the two representative pixels that were shown 
in Figure 1, along with estimated poy, pcy, and py using the EM method described in 
Section 2.2. For the pixel on the left panel, 'piy = 6 and p 2 y = 11 (no recovery), while for 
the pixel on the right panel we have piy = 4 and p 2 y = 8. 
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Fig 2. Spectral-temporal profiles for two representative pixels in study area along with estimated mean profiles 
for background and change land cover classes. Hollow points mark EM-imputed values. Dashed lines during 
change periods represent mean profiles under background land cover class for comparison. Reflectance values 
have been multiplied by 10000. 


To set a weakly informative prior on we settle on a hierarchy that depends on two 
probabilities—the probability of a change occurring, vro, and, given that a change occurred, 
the probability of recovery tt/j— and we specify that configurations with the same number of 
change points are equally likely. Thus, the probabilities of no change, change without recov¬ 
ery (one change point), and change with recovery (two change points) are given, respectively, 
by 


^(/Oli; = P2v = J) = ^- VTo, 

( 4 ) ^{piv < P2v = J) = 

nPlv < P2V <J) = 

Finally, we set | Q ~ MN(1, o:) to depend on a region-wise parameter a that tells the a 
priori probability of changing to a certain class in fa, and elect a conjugate prior a ~ Dir(7r). 
The specification of tt provides an advantageous flexibility that we can exploit to inform 
the model of land cover classes we anticipate seeing after a change has occurred, making 
our approach particularly well suited for changes in the form of land cover conversions. 

Our model can accommodate changes in mean or covariance and benehts from a Bayesian 
approach which incorporates potential a priori information about existence and location of 
a change point. Our ultimate goal with this model consists of inferring the change point 
locations p^ for every pixel in the region of interest, a task we discuss next. 

2.2. Identifying Change Points via Expectation-Maximization. To account for missing 
data, we partition the data in year i and pixel v as Xi^ = {Yi^, Zi^) where Yiy are actual 
observed data and Ziy are missing values. The missing entries Ziy can occur at multiple times 
within year i and at multiple spectral bands, and these entries can vary from pixel to pixel. 
We assume that Ziy occur missing at random and represent them as Z = Tvg^ 

the whole collection of missing values in the dataset (and similarly for Y = {Y^y}, the 
observed values.) 

To estimate our parameters of interest 0 = {{py}^^^, a} we select a representative of 






H. GLANZ ET AL./BAY. CH. POINT MOD. LAND COVER IN MODIS DATA 


the posterior distribution P(0 | Y) such as the maximum a posteriori (MAP) estimator 
(5) 0 = argmaxy^ / P(0, Z, VP 11") dZ = argmaxP(0 | y), 

0 ^ J 0 


where W = that is, we marginalize the nuisance parameters Z^, the missing 

values, and the change land class across all pixels v G While a traditional Bayesian 
approach relies on estimating P(0 | y) using Markov chain Monte Carlo (MCMC) meth¬ 
ods (Robert and Casella, 1999; Gelman et ah, 2003), here we adopt an EM routine for 
computational expediency since we anticipate assessing change in large datasets that often 
comprise millions of pixels. Under this setup, we regard both Z and W as latent variables 
and wish to estimate directly the MAP in (5) by following a procedure that starts at some 
arbitrary 0 ^^^ and iteratively updates 

0P+1) = argmax(5(0, 0h)) ;= argmaxE^ ^, y.Q(t) [logP(0, Z, VP, y)] 

IR\ © 0 

(o) 

= argmaxE^^y|y. 0 (p logP(0, Z, VP | y) 

0 ’ I ’ 


until convergence. Function Q computes the expectation (E) step, while the update in ( 6 ) 
performs the maximization (M) step. 

In the spirit of a cyclic gradient descent approach, we alternate between updating the 
“global” parameter ot and then updating change points pv for each pixel v. This procedure is 
similar to a block version of an expectation conditional maximization (ECM) routine (Meng 
and Rubin, 1993). The details are as follows: 

1. Start at arbitrary for example, set = T^k/ k £ and = 

p^ 2 v = J for a-ll pixels v £ M. 

2. For t = 1, 2,... (until convergence) do 

(a) For k £ ^ do: update 


(7) 


a 


(t+i) 


V . u)^;W = A:|W;0W) + 7rfc-l 

^ • P\v 


where = |{r' : p^H < T}| is the number of pixels with changes and 




( 8 ) 


P(VP^, = fc|W;0W) 


4^¥(W|VP^ = fc;0M) 


We note that if we denote by miss(A) and — miss(A) the indices of missing and 
non-missing values in X respectively then 

\W^ = k ~ N{pk, -miss(Ai„)) ^g,-miss(Ai„),-miss{Ai„))) 
which we can use to compute E(yj; | VPy = k; 0 ^) in ( 8 ). 
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(b) For each pixel v in the region of interest do: update py by selecting 

(9) = argmin I ^ S{Xiy-, 

^ I ieBG(p) 

+ E E ¥{Wy = k\Yy-Q^^)S{Xiy-pg,llg) 

i^BG(p) se-r 

-2/(pi < J) = /j|y,;0W)loga('+i) -21ogP(p) i, 

pe-r J 


where 

(10) S{X]Pg, Eg) := log |Sg| + {Xg - ^ g)^ {Xg - ^ g) 

+ E XP),,iy(x-X)),t 

j',/cGmiss(X) 

with Tig := Tp + fqIbt if 5 = -F and Tg := Tg + KqIbt for g € Y’. More 
details about the EM-related variables Xg, an EM-imputed version of X, and 
V{X-, Eg), the conditional variance of ymiss(x) given can be found in 

the Appendix. 

The update in (9) proceeds by first computing the sufficient statistics in (10) for 
every Xiy and g = F and g € Y’ and then systematically spanning the possible 
values of p by including and excluding each year from the background while 
keeping track of the optimal minimum value of the objective in (9). 

We assess convergence by checking if the change in Q between successive iterations is not 
significant, that is, we set a threshold e, say e = 10 “®, and stop when 1 ( 5 ( 0 *-*^^^ 0 ^*^) — 
Q( 0 {t)^ 0 {i-i))| Details on the variables in (10) and derivations of the update equations 
above can be found in the Appendix. However, we can already notice that inferring the 
change point locations py does not involve only imputation of the missing values, as the 
quadratic term with Xg implies; we still need to account for the extra variability that arises 
from the uncertainty in the missing values, as captured by the term with Vg{X). 

In Eigure 2 we show the results of the proposed method in two pixels. In both plots, the 
hollow points are the EM-imputed values X^y, while the mean profile during change, that 

is, for years between piy + 1 and p 2 y, is taken as pcv = Pg* with g* = argmaxP(HA = 

fce'T 

k\Yy;py,dL) the modal land cover class. Both pixels belong to the region studied in the next 
section, where we provide more details about model fit and inference. R code implementing 
this EM routine is available in the Supplementary Material. 

3. Data Analysis and Results. In this section we apply the EM routine from Sec¬ 
tion 2.2 in a simulation study and a case study involving data from the Xingu River Basin 
in the Amazon. 

3.1. Simulation Study. For the model and EM routine described above, we need to 
estimate the parameters of ( 1 ) for each of the land cover classes prevalent in our region of 
interest: the Xingu River Basin in the southeastern part of the Amazon. We characterize 
the regional land cover classes using a set of training sites in South America located in the 
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Fig 3. Batch accuracies for three different methods applied to simulated change and no-change data using the 
metrics outlined in (11). The “90-Thresh” and “95-Thresh” correspond to the method in (Huang and Friedl, 
2014 ) with thresholds of 90% and 95%; “CPD” corresponds to our proposed method. 


Table 2 

Average overall accuracies in each batch, for each of the three methods. 


Missing % 

90-Thresh 

95-Thresh 

CPD 

20 

0.781 

0.794 

0.920 

30 

0.782 

0.796 

0.916 

40 

0.780 

0.793 

0.913 

50 

0.779 

0.792 

0.909 


Olson “Tropical and Subtropical Moist Broadleaf Forests” biome between 0 and 20°S (Friedl 
et al., 2010). Evergreen Broadleaf Forests (class 2) constitute our background (pre- and 
post-change) data. 

Our change point simulation study uses a separate set of training sites to simulate datasets 
consisting of some pixels with a change and some without. That is, we construct new, 
artificial time series profiles based on an independent collection of 100 pixels which contain 
different types of user-identified changes. 

A constructed no-change pixel consists of whole years of data being sampled one year 
at a time from the portion of these 100 pixels identified as “background.” A constructed 
change pixel begins with a randomly generated change point configuration which partitions 
the time series into “background” and “change” periods; then data for these periods are 
sampled again, one year at a time, from the “background” and “change” portions of the 100 
training pixels. A single replication involves simulating 60 no-change pixels and 60 change 
pixels. For each pixel we stitch together 11 years of data. Each annual profile consists of 
data for bands 1 through 7 over 19 time points, as described in Section 2. A single batch 
consists of 100 such replications. To explore the influence of missing data we created data 
for four batches, and induced minimum proportions of missing data of 20%, 30%, 40% and 
50% in each batch respectively. As a basis for comparison, we applied our proposed change 
point method as well as another contemporary method (Huang and Friedl, 2014) to these 
simulated data. 
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To measure the performance of a change point method we consider three metrics: pro¬ 
ducer’s accuracy P (sensitivity, recall), user’s accuracy U (positive predictive value, preci¬ 
sion), and (overall) accuracy A. Given two change point configurations p, as classified by 
the method, and p, the ground truth configuration, each metric is given by: 


( 11 ) 


Uip,p) 


Ef=i/(i0BG(p))J(z0BG(p)) 

Eii/(*0BG(p)) 

E)LiJ(i0BG(p))J(i0BG(p)) 

Eti/(i0BG(p)) 


and 


i ^ /(i 0 BG(p))/(i 0 BG(^) + lii G BG(p))/(i G BG(p)). 


If the denominator in either P or [/ is zero we arbitrarily set them to zero. The boxplots 
in Figure 3 and values in Table 2 snmmarize the three accuracies mentioned above for 
our proposed method as well as the method in (Huang and Friedl, 2014) with thresholds 
of 90% and 95%. In every situation our proposed method out-performs the contemporary 
method at both 90% and 95% thresholds. Furthermore, our method consistently achieves 
high accuracies (>90%) across substantial amounts of missing data. The noticeable dip in 
user’s accuracy (as compared with producer’s and overall) across all methods stems from a 
tendency to identify an excessively long change period. To adapt to this we could consider 
updating our belief about the probability of recovery. After successfully applying our method 
to simulated data, we proceed to detect change in a particular region of the Xingu River 
Basin. 


3.2. Case Study. We apply the EM algorithm described in Section 2.2 to an area (2500 
MODIS pixels, ~134 km?) in the Xingu River Basin, located in the Southeastern part of the 
Amazon in the State of Mato Grosso, Brazil. The study region has several distinct types of 
natural vegetation including moist tropical rainforest, cerrado, and deciduous forest. Despite 
containing substantial area of protected indigenous lands, large areas of the basin’s EBF 
have been converted to agricultural lands for soybean production and cattle ranching since 
2000 (Huang and Eriedl, 2014). 

To avoid spurious results, we do not consider IGBP classes that are not native to the 
study area: 1 (evergreen needleleaf forests), 3 (deciduous needleleaf forests), and 4 (deciduous 
broadleaf forest), 11 (permanent wetlands), 13 (urban and built-up), and 15 (snow and ice). 
Thus, only IGBP classes 5 (MXE), 6 (GSH), 7 (OSH), 8 (WSA), 9 (SAV), 10 (GRA), 
12 (GRL), 14 (GRM), 16 (BAR), and 17 (WAT) are assumed as possible change classes, 
while IGBP class 2, EBF, is taken as the background class. For the analysis we assnmed that 
vTo = 10“^*^, TTjj = 0.01, and that kq = Kc = 5 ■ 10^ which is roughly 1/5 of the data variance 
in the classes. The very stringent value for the probability of change ttq aims at providing 
a more robust change point inference against outliers. As the probability of recovery vr/j 
suggests, we expect that a priori approximately 1% of the changed pixels actually recover. 

To assess our results, we used a high quality Landsat-based deforestation dataset called 
PRODES (Monitoring the Brazilian Amazon Gross Deforestation), produced by Brazil’s 
National Institute for Space Research (INPE) (INPE, 2012). We derived annual sub-pixel 
fractions of deforestation and the year of change at MODIS spatial resolution (see (Huang 
and Friedl, 2014) for details). In particular, to evaluate the performance of our method, for 
each pixel u G ^ we compare the estimated change segmentation given by py to reference 
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pixel X 


pixel X 


Concordance 


Fig 4. Results of change point analysis in the Xingu River study region. Top row: estimated change points 
piv in the leftmost panel (darker shades mean earlier changes), conditional probabilities of no change in the 
rightmost panels (darker shades represent smaller probabilities.) Bottom row, left to right: ground-truth ref¬ 
erence (darker shades code for higher deforestation), concordance with estimated change point configurations 
(darker shades capture lower concordance), and distribution of concordance values across pixels. 


deforestation percentages fy using a measure of concordance C: 

(12) C{py, fy) e BG(p.))(1 - fiy) + I(i 0 BG{py))fiy. 

1=1 

We note that this measure can be seen as an expected accuracy if we regard fiy as the 
probability of the i-th reference year not being in the background state. 

Figure 4 summarizes the inferred changes. In the top left panel we plot the estimated 
change year for each pixel piy at the end of the EM procedure for pixel v. Darker grays 
represent earlier changes and white, in particular, codes for piy = J, i.e.no change. The 
two top rightmost panels show the conditional probability of no change, that is, IP(/Oi^ = 
J\Yy;Q), with darker shades representing smaller probabilities; as can be seen from the 
contrast in the spatial pattern and the boxplot, the changes are very accentuated within 
clusters. The bottom panels show that the inferred change points are in very good agreement 
with the ground-truth reference: the leftmost panel plots maxj=i^...^j fiy, with darker shades 
representing higher levels of deforestation; the middle panel plots the concordance measure 
in (12), darker shades coding for lower concordance values to highlight contrasts; and the 
rightmost panel illustrating the distribution of concordance values across pixels. As we can 
see, concordance is overall high and the low values are concentrated either at the borders 
of change clusters or at small change “islands” (clusters.) 
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Fig 5. Results of change point analysis in the Xingu River study region. Top row: estimated change points, 
land cover class compositions (top panels), and projected profiles (bottom panels) for two representative 
pixels in the study region. In top panels: probability of first change point with change weights given in color 
according to land cover class (see legend), dashed line marks probability of no change, dark gray bar marks 
probability of second change to background class (recovery), light gray background represents deforestation 
percentages from ground-truth reference; in bottom panels: PC-projected spectro-temporal data profiles, with 
hollow points marking EM-imputed values, solid lines representing mean land cover profiles, and dashed lines 
mean profiles for background class. Bottom-left panel: probabilities of land cover changes (bars) given that 
change has occurred; jittered points highlight the same probabilities but when these probabilities are maximized 
for the respective change class. Bottom-right panel: overall land cover classification based on inferred change. 
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An important feature of our model is to not only represent changes but to also charac¬ 
terize these changes according to land cover classes. As an example, the top row of Figure 5 
illustrates the results for two typical pixels in the study region. In each panel, the top plot 
shows the probability of change points, 11^; 0), further stratified by class probabilities 
P(VFy I 0) in a colored bar, F{p 2 v \ Yy] 0) in a dark gray bar if positive, and the defor¬ 
estation percentages from the ground-truth reference in light gray in the background. The 
dashed line represents the probability of no change. The bottom plot depicts PC-projected 
NEAR values as in (2) with K = 3 with EM-imputed values in hollow points; the solid 
lines in each year outline the projected mean profile for the inferred class in the year, while 
dashed lines represent background (EBF) yearly profiles. As we can see from the left panel, 
the changed class profile fits the data reasonably well, and hence the high probability of 
change at year 7; on the other hand, in the right panel the projected data does not seem 
to follow class profiles closely and so the no-change probability is closer to the now smaller 
change probability and the class to which the pixel changed is less certain. 

The bottom row portrays to which land cover classes pixels change (left panel, posterior 
conditional probabilities P(IEy | Yy,piy > J)) and how these change classes are distributed 
spatially in the study region (right panel.) In the left barplot, we see that the most common 
change classes are, in order, savannas (IGBP class 9), woody savannas (IGBP 8), grasslands 
(IGBP 10), and croplands (IGBP 12.) The jittered gray points highlight the probability 
of changing to each class C when P(ITy | Yy,piy > J) is maximized at C. The right panel 
displays the study region with each pixel colored by either the background EBF class if 
there is no inferred change, or by the class that maximizes ¥{Wy \ Yy,piy > J) in case of 
change. Thus, the panel contains the same spatial patterns as in the top two leftmost panels 
in Figure 4, but it adds a characterization of change according to land cover. 

4. Discussion. As the simulation study in Section 3.1 indicates, the proposed model 
and EM inferential routine yield better results than a state-of-the-art alternative method. 
Gur better performance can be explained mainly by three factors: first, our proposed model 
incorporates data from all bands, instead of relying on particular bands or combined statis¬ 
tics (e.g.NDVI and EVI (Myneni et ah, 1995; Huete et ah, 2002)); missing data is ubiquitous 
in remote sensing and while many methods depend on extraneous gap-filling procedures, our 
method accommodates missing data consistently with our model via expectation; finally, 
our model is more flexible since we allow for at most two change points to capture recovery 
from change. 

Our proposed methodology has also performed well in the real-world case study in Sec¬ 
tion 3.2. The results are in very good agreement with the ground-truth reference. Interest¬ 
ingly, as we can see in Figure 4, the inferred changes seem to follow a clear spatial pattern 
usually going northwest to southeast and operating on clusters; this effect is reassuring since 
the model makes no provisions for spatial interactions and so the pattern is fortuitous. Con¬ 
cordance is generally high over the whole study region with low concordance pixels being 
localized to change cluster borders—which we attribute to pixels with mixed class compo¬ 
sitions due to transitions from background (see, for example, (Jin and Sader, 2005; Lunetta 
et ah, 2006))— or to small clusters. These small clusters capture larger discrepancies with 
the reference about the existence of change and/or deforestation. 

The two exemplar pixels in Figure 5 highlight the two major types of discrepancies to 
the ground-truth reference that lead to lower concordance values in Figure 4. In the top 
left panel we have a low deforestation percentage but high estimated probability of change 
at year 7 to savanna; this pixel belongs to the small cluster in the southeast corner of 
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the region. Given that the spectro-temporal profile for savanna is similar to the profile for 
evergreen broadleaf forests (EBF, the background land cover class), and that savannas have 
from 10 to 30% of forest canopy cover, it is reasonable to confuse this land cover class with 
a low deforestation profile. In the top right panel we summarize the results from the EM 
method for a pixel in the southern border of the big change cluster in the middle of the 
study region. In this case, the class fragmentation at the change year, year 5, and possible 
recovery at year 8 can be attributed to deforestation and/or degradation at sub-pixel scale. 

As we can see in the bottom left panel of Figure 5, in the Xingu River region case study 
most land cover classes in the estimated change segments are woody savannas, savannas, 
grasslands, or croplands (IGBP classes 8, 9, 10, and 12, respectively). Groplands and grass¬ 
lands are often found in regions with earlier change points (darker regions in the top left 
plot in Figure 4), and might correspond to new land uses such as soy plantations and cattle 
ranching farms. In contrast, later change point regions are often classified as woody savan¬ 
nas, which have higher canopy density and might signal recent deforestation. Savannas have 
lower canopy density and are localized to either border pixels, as a transition land cover 
class, or to isolated islands; these smaller stranded regions could correspond to degradation 
areas, a more veiled form of deforestation. Interestingly, most discrepancies to the reference 
deforestation percentages overlap with this land cover class; this can be explained by lower 
deforestation percentages in these regions. 

5. Conclusion. Detecting changes in land cover can provide crucial information for 
land use policy, natural resource management, and ecosystem modeling efforts. Remote 
sensing offers a spectrally and temporally rich source of data with which to make inference 
about changes in land cover at broad spatial scales. Unfortunately, missing values pervade 
most datasets for a multitude of reasons. 

In this article we proposed a hierarchical model for identifying conversion-type changes in 
MODIS time series which accounts for missing data. The collection of MODIS training sites 
for the IGBP classification scheme is extensive and provides a useful resource for character¬ 
izing these high-dimensional data. We use these training data to estimate model parameters 
for 11 IGBP land cover classes including our background class: Evergreen Broadleaf For¬ 
est. With these estimates in hand we proceed to analyze pixels independently with an EM 
algorithm to detect the presence or lack of change points. The change points we identify 
characterize distributional changes from EBF to one of the other IGBP land cover classes 
present in our training dataset. 

Not only can our approach identify change points, but the posteriors in (8) can be used 
to informally assess what class or classes the changed data represent. The methodology 
we propose here has two distinctive features: first, while our method is probably best used 
to find abrupt changes in time series, such as disturbances, it is flexible enough to handle 
gradual changes by suitably defining change probabilities tto and ttr and fitting class prob¬ 
abilities a; moreover, the methodology we propose allows for recovery from change. These 
two important features are essential to characterizing and interpreting changes and are, in 
particular, essential to remote sensing applications. We note that hyper-prior parameters ttq 
and TT/j control how robust the method is to outliers and should be carefully elicited based 
on similar study regions. 

In general, our EM algorithm could be used successfully on data or land cover displaying 
a conversion-type change. To accommodate other types of disturbances such fire and log¬ 
ging, our model would require exemplars from these situations. That is, we would need to 
characterize the surface after a fire or after logging in the parameter estimates of (1) (i.e. 
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training data for “post-fire” or “post-logging”) in order to detect these kinds of changes in 
new pixels. 

We demonstrated the effectiveness of our method with a simulation study and a case study 
in the Xingu River Basin. Our results indicate that our method performs better than state- 
of-the-art methods and has high concordance to ground-truth references. More specifically, 
we recovered nicely the spatial and temporal configuration of changes in the study regions 
and were able to interpret the changes by their inferred land cover classes and spatial 
localization. Overall, our method produced satisfying results and should be considered for 
detecting conversion-type changes in remotely sensed time series that contain missing data. 

As future work we intend to extend this method to formally account for changes in 
space, that is, not only in time, and to investigate an alternative estimator for change 
configurations that maximizes the posterior expected accuracy, that is, to define pA '■= 
argmaxEpi y [A(p, pi)] and devise a computationally efficient method to obtain pA- 
p 
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APPENDIX A: EXPEGTATION-MAXIMIZATION DERIVATIONS 


To derive the EM updates in Section 2.2 we need 


Q(0,0W) = 

®Z,IT| V;0(‘) [logP(0,Z,IT,y) 


“ '^z,w\Y-,eW 


10gP(X„, W I Wy,Py) + 10gP(/9^) 


< J)logP(HA I q)-L logP(Q:) , 


as defined in (6). The indicator I{piv < J) filters pixels that have at least one change. To 
derive the conditional updates for a and py for each v € ^ we identify two functions that 
capture the terms in Q that depend on a, 




(13) 


v:piy<J 


logP(H4 I ct) + logP(Q;) 

.v.pi^<J 

log P(H4 I ot) + log P(q!) 


and on py at pixel v, 

Qp,yi@,&^^^) = \Og¥{Zy,Yy I Wy, Py) +\Og¥ {py 

+ I {ply < J) log F{Wy I Ol) 


(14) 


= I Wy,py)] -blogE(/9^) 

+ Iipiy < J)E^^^^^|^^^^(o[logE(W,|a)]. 


Note that <5(0, 0^*^) = Qp^y{Q, + logE(Q:) and that the term I{piy < J)¥{Wy \ a) 
is shared between Qa and Qp^y. 
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Updating a. Let us start with the ct-update in Step 2.a; we need to optimize Q with 
respect to a subject to the constraint h{oL) = Og — 1 = 0. To this end, we define a 

Lagrange multiplier A and solve 


d 

dak 


Q„( 0 , 0 W)-A/i(q) 


d 

dak 




I{W, = g) log 


CXn 




OLq 


and so, fixing /9„ to its value in the previous iteration, pj, , we get the update in (7), 


(t) 




= 0 , 


a 


p+l) _ TTfc - 1 + - k)] 

~ A 

E,.,pW^jnWv = k\Y,- 0W) + TTfc - 1 

" = 9 1 0W) + vr, -1] 

^ I + ^k-i 

_ ^-Plv _ 

+ Zlge'^^9 “ 1^1 

where Ege-^ = g\Yy;e^d) = is the number of pixels 

with changes. To compute the update we just need the expression in (8), for k G 


F{Wy = A;|y^;0W) = 


P(y, I TU; = A:; 0W)P(TU; = k; 0^) 

Ege^^nYv \W, = g-, 0W)P(fT. = p; 0W) 

^ W, = fc;0M) 

~ Ege‘ff»?nyv\w, = g-,e(d)' 

Updating pv In Step 2.b we fix a and update the remaining parameters in 0. We 
update them jointly, but in parallel for each pixel. The last term in Qp^y is already known 
from the last section, and we condition a to its recently updated value 


(15) 




logP(IK|a)j = = p| W;0W)logaJ+i). 


Now we just need to obtain 
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where T,p = T,p + kqIbt, ^9 = + HcIbt for g £ as in the main text, and, if 

X ~ N{g,,'E,) with missing entries at indices miss. 


5(X; g, S) := log |E| + [(X - fi)'^^-\X - g) 

To evaluate 5, we need 


E 


v,,iss I v_ 


(X - g^^-HX - g)] = [tr{(X - g)^T.-\X - g)} 


= E 
= tr 


|S“^E 


' v I y 
^miss I ^ —n 


{X-g){X-gf]] 


,T 


= {X-g) S-^(X-/x) 


+ tr 


{e-> 


Var 


Vn,iss I V_, 




since, with X = x_^.^jX], we have the Pythagorean relationship 


E 




iX-g){X-g) 


T 


= E 


I v_ 


+ {X-g){X-g) 


(X-X)(X-X) 

T 


T 


= '^^^x^..\x.^jX] + {X-g){X-g) . 

Let us denote hyV{X; E) := | Clearly, X.^iss = X.^iss and so V (X; = 

0 wherever j 0 miss or k ^ miss. The remaining entries in X and V{X] S) are known from 


-^miss I miss 


^Mmiss T (S_fYijss iT^jss) (5^ —miss,—miss) (^—miss /^—miss)? 


-'miss,miss 


-(S- 


)T(E_ 


miss,miss 


miss,—miss) miss,miss 


Thus, 


5(X;/r,S) = log|S| + (X-/i)^S-i(X-/r)+ J] (S-i)^.,P(X; S)^.„ 

j,k£m\ss 


which yields the dehnition in ( 10 ). 

Finally, putting together (15) and (16) in the definition of Qp,v, and since argmaxQp^^ = 

p 

argmin —2Qp_.y, we have the update expression in (9). 

p 
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